GUBS, a Behavior-based Language for Open System 
Dedicated to Synthetic Biology 



Adrien Basso-Blandin
IBISC lab., Evry University
abasso@ibisc.univ-evry.fr

Franck Delaplace
IBISC lab., Evry University
franck.delaplace@ibisc.univ-evry.fr



In this article, we propose a domain-specific language, GUBS (Genomic Unified Behavior
Specification), dedicated to the behavioural specification of synthetic biological devices, viewed
as discrete open dynamical systems. GUBS is a rule-based declarative language. In contrast to a
closed system, a program is always a partial description of the behaviour of the system. The
semantics of the language accounts for the existence of hidden, non-specified actions that may
alter the behaviour of the programmed devices. The compilation framework follows a scheme similar
to automatic theorem proving, aiming at improving the safety of synthetic biological design.

1 Introduction 

Synthetic biology is an emerging scientific field combining the investigative nature of biology
with the constructive nature of engineering [22] to design synthetic biological systems. The issue
is to devise new functionalities or behaviours that do not exist in nature. The field is therefore
looking for principles and tools to make biological devices inter-operable and programmable [19].
Synthetic biology projects first focused on the design and improvement of small genetic devices
comparable to logic gates in electronic circuits [23] ITTTl. Recently, projects have attempted to
develop large bio-systems integrating different devices with, as a long-term goal, the design of
de novo synthetic genomes [16]. In this endeavour, computer-aided design (CAD) environments play a
central role by providing the features required to engineer systems: specification, analysis, and
tuning Hl [20] [25] [12]. Pioneering applications in the iGEM competition show the valuable
potential of such environments.

Currently, the design specifies the structural assembly of DNA sequences (biobricks), as in
GENOCAD 0. Although this description is indispensable to provide a finalized specification of
devices, its abstraction level seems inappropriate for tackling large bio-systems. The size of the
programs required for sequence description likely makes the task error-prone and intractable. In
the same way as large software cannot be programmed in binary, large biological systems cannot be
described as a DNA sequence assembly. Scaling up the complexity of synthetic biological systems
therefore requires complementing the structural description with an additional abstract
programming layer based on a high-level programming language, and harnessing the automatic
conversion of the design specification into a DNA sequence, as compilers do. A high-level
programming language for synthetic biology has been announced as a key milestone for the second
wave of synthetic biology, needed to overcome the complexity of large synthetic system design [22].
Nonetheless, in this domain, language technology is still in its infancy, and transforming this
vision into a concrete reality remains a daunting challenge.

Such a high-level language should describe the devices in terms of functionalities, offering the
ability to program the behaviour directly instead of the structure supporting this behaviour.
Indeed, behaviour specification helps to document the device accurately by adding its behavioural
description, to assess its functionality automatically and formally, notably by generating
test-benches from this specification, and to gain a relative independence from technology, because
different biological structures can carry out the same functionality. In this framework, the
components are selected and organized automatically or semi-automatically at compile phase to
generate a structural description of a device whose behaviour complies with the specified
function. Such an approach has already been achieved in hardware by using languages such as
VHDL HI or VERILOG [24] to overcome the growing complexity of electronic circuits. However, the
major difference in synthetic biology relates to the openness of biological systems. The issue is
thus to propose a behavioural language for open systems. More precisely, GUBS is a rule-based
declarative language dedicated to the behavioural specification of discrete open dynamical systems
for synthetic biology that interact with their environment. GUBS defines the behaviours
symbolically to provide a relative independence from structures by postponing the selection of
biological components to the compile phase. Within this framework, the compiler translates the
behavioural specification into a structural description of a device whose behaviour carries the
functional features defined by a program. The proposed compilation method is inspired by automated
theorem proving.

After introducing the main features of the GUBS language (Section 2), we define the semantics of
GUBS based on hybrid logic. Then, we detail the proof-based principles governing the compilation
(Section 3), illustrated with a complete example (Section 4). After a survey of related work
(Section 5), we conclude (Section 6).

2 GUBS language 

In this section, we describe the main features of GUBS. 

Constants and variables. In GUBS, two kinds of objects are distinguished: the constants and the
variables. The constants designate the pre-defined objects in a corpus of knowledge. In biology,
the constants may refer to proteins or genes of interest. For example, the agent LacZ refers to the
LacZ protein or gene. By convention, their names start with a capital letter. The variables refer
to an abstraction of these pre-defined objects and can potentially be replaced (substituted) by any
constant. By convention, the variable names start with a lowercase letter.

Agents, attributes and states. The agents represent the biological objects. Their different
observable states characterize their different behaviours. The behaviours define the different
capacities for action on the state of the other agents. They are characterized symbolically by a
set of attributes categorizing these different capacities. The real significance of the attributes
is a matter of convention depending on the targeted realization (e.g., protein pathways, gene
network) and will be addressed through examples. For instance, the regulatory activity of a gene is
observationally related to thresholds of RNA transcript concentration. At a given threshold, a gene
regulates a given set of genes whereas at another one the regulation applies to another set of
genes (see Figure 1). The different thresholds define the levels of gene activity leading to
different regulatory activities. For a gene G, if we identify three different kinds of regulatory
activities, the state of this gene will be defined by three different attributes
{Low, Mid, High} that symbolically characterize three possible behaviours. For example, G(Low)
expresses the fact that agent G is in state Low and is then ready for the action corresponding to
this attribute. In some cases, a single state is sufficient to qualify the capacity for action of
the agent. Hence, the agent is identified with its capacity. Then, G means that agent G is
available.
By contrast, $\overline{G}$(Low) signifies that the state of the agent differs from Low
($\overline{G}$ when an agent has a single capacity). It is worth pointing out that not being in a
state defined by an attribute does not necessarily mean that the agent is in a state defined by
another attribute. Indeed, for open systems, the state of the agents could be of any sort and does
not necessarily belong to the pre-defined attributes.

Two kinds of relations on attributes are defined: an order, <, meaning "less capacity than", and
an inequality, ≠, meaning "different capacity than". Low < Mid implies that the capacity for
action of Mid includes the capacity related to Low. Usually, in gene regulatory models [14], the
set of genes regulated at a given level is also regulated at any higher level. By contrast, in
signalling pathways, the phosphorylation of a protein induces a conformational change of the
structure leading to a specific signalling potentiality that does not occur in the unphosphorylated
conformation. Assuming that Phos and UnPhos respectively represent the phosphorylated and the
unphosphorylated conformations of protein P, we have Phos ≠ UnPhos. Then, P(Phos) implicitly
implies $\overline{P}$(UnPhos). The attributes and the relations between attributes are declared
as follows: G :: {Low < Mid, Mid < High}, P :: {Phos ≠ UnPhos}. A simple set of attributes
replaces the relations when they are unknown or when no specific relation is set between the
attributes.
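
The two relations on attributes can be illustrated with a minimal Python sketch. It assumes that the order < is closed transitively (which follows from reading a1 < a2 as "the capacity of a2 includes that of a1") and that ≠ is symmetric; the function and variable names are hypothetical and not part of GUBS.

```python
def implied_capacities(attr, less_than):
    """Return attr together with every attribute whose capacity it includes.

    less_than is a set of pairs (a, b) read as a < b ("a has less capacity than b").
    """
    result = {attr}
    changed = True
    while changed:  # transitive closure, downwards along the order
        changed = False
        for lower, upper in less_than:
            if upper in result and lower not in result:
                result.add(lower)
                changed = True
    return result


def excluded_capacities(attr, different):
    """Return the attributes ruled out by attr through the inequality relation."""
    excluded = set()
    for a, b in different:
        if a == attr:
            excluded.add(b)
        if b == attr:
            excluded.add(a)
    return excluded


# Declarations from the text: G :: {Low < Mid, Mid < High}, P :: {Phos != UnPhos}
G_ORDER = {("Low", "Mid"), ("Mid", "High")}
P_DIFF = {("Phos", "UnPhos")}

print(implied_capacities("High", G_ORDER))
print(excluded_capacities("Phos", P_DIFF))
```

With these declarations, G(High) carries the capacities of Mid and Low as well, whereas P(Phos) only excludes UnPhos.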

Finally, the description of the agent state is extended to a collection of agent states as
follows: $g_1 + \ldots + g_n$, meaning that all the agent states $g_i$ are observed concomitantly.



Trace, event, and history. A GUBS program describes a behaviour, and its interpretation is based on
the observations of the designed systems. The issue is then to formalize the notion of behaviour
observation. To this end, we focus on the notion of trace, which symbolically represents the
evolution of some quantities related to the agents of interest by the evolution of the states of
these agents. A trace can be obtained from experiments by establishing a correspondence between
measurements of some quantities (e.g., RNA transcript concentration) and attributes of agents.
Formally, a trace, $(T_t)_{1 \le t \le m}$, is a finite sequence of agent state sets where each set
contains the agent states at a given instant. For instance, the evolution of a concentration from
Low to High for G may be described by the following trace of 6 instants, numbered from 1 to 6 and
followed by an extra step 7:
({G(Low)}, {G(Low)}, {G(Mid)}, {G(Mid)}, {G(Mid)}, {G(High)}).
However, not all the events in a trace are necessarily relevant with regard to the behaviour
description. For example, if we focus on the evolution from Low to High for G, only three events
are relevant for the behaviour description: G(Low), G(Mid), G(High), without accounting for the
intermediary evolution stages occurring in between. Behaviour recognition thus always emphasizes
the key events in a trace, which entails contracting the trace to show their succession. Such a
contracted series is called a consistent history of the expected behaviour. Generally speaking, a
history is related to a chronological division of a trace into periods, where the events of a
period represent all the agent states occurring at each instant of the period. A history is then a
sequence of these event sets. Given a trace $(T_t)_{1 \le t \le m}$ and a chronological division
$(d_i)_{1 \le i \le n}$, corresponding to the sequence of the starting dates of the periods, the
history is the sequence of agent state sets occurring in each period, $(H_i)_{1 \le i < n}$, such
that each $H_i = \bigcup_{d_i \le t < d_{i+1}} T_t$. Hence, a consistent history is purposely made
to point out the characteristic event steps of a behaviour description.

In the previous example, a chronological division¹ of the trace leading to a history consistent
with the expected evolution from Low to High for G is (1,3,6,7), which corresponds to the
following discrete time intervals: ([1,2], [3,5], [6,6]). The resulting history is
({G(Low)}, {G(Mid)}, {G(High)}). Notice that (1,2,4,7) also fits. However, the chronological
division (1,3,7) leads to an inconsistent history because the level Mid is not seen as an
intermediary event in the history (see also Figure 1, which depicts the trace and a consistent
history for each dependence). The formal definition of consistency within the scope of the
semantics is given in Section 2.1.

1 Step 7 is inserted as an extra step to comply with the definition of the chronological division.

Figure 1: The curves represent the typical behaviours of the causal dependences based on the time
evolution of a quantity (q) related to agents c and e (e.g., RNA transcript for gene regulation).
The symbolic agent states c and e are here both associated with the maximal threshold of the
quantity. The symbolic trace (T) results from a periodic sampling of the evolution, identifying
whether c or e occur. A consistent history (H) complying with the definition of a causal dependence
is represented below the trace. The first graphic illustrates the normal dependence, c ○→ e, the
second the persistent one, c ⊙→ e, and the third the remanent one, c ●→ e.
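
The construction of a history from a trace and a chronological division, as defined above, can be sketched in a few lines of Python. This is only an illustration: instants are assumed to be numbered from 1, agent states are encoded as plain strings, and the function name `history` is hypothetical.

```python
def history(trace, division):
    """Build the history of a trace for a chronological division.

    trace: list of sets of agent states, instants numbered from 1.
    division: increasing starting dates d_1, ..., d_n; period i covers d_i <= t < d_{i+1}.
    """
    periods = zip(division, division[1:])
    # H_i is the union of the T_t for the instants t of period i.
    return [set().union(*trace[start - 1:end - 1]) for start, end in periods]


TRACE = [{"G(Low)"}, {"G(Low)"}, {"G(Mid)"}, {"G(Mid)"}, {"G(Mid)"}, {"G(High)"}]

print(history(TRACE, (1, 3, 6, 7)))  # three periods, one level each
print(history(TRACE, (1, 3, 7)))     # Mid and High are merged into one period
```

On the example trace, the division (1,3,6,7) separates the three levels, whereas (1,3,7) merges Mid and High into a single period, so Mid no longer appears as an intermediary event.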

Behavioral dependence and observation spot. A behavioural dependence identifies a relation between
behaviours as a causal relation on events. Basically, the dependences should define the control of
agents over other agents. However, the definition of causality also needs to be adapted to the
openness of the system. A seminal definition of causality, proposed by Hume [tTTl, is formulated in
terms of regularity on events: "[we may define] a cause to be an object, followed by another, and
where all the objects similar to the first are followed by objects similar to the second". Although
this definition appropriately characterizes the notion of control, the openness of the system
requires accounting for the actions of the environment that may alter the causal dependence chain.
For example, a programmed activation G1 → G2 may be contradicted by an existing inhibition
G3 ⊣ G2 addressing the same target gene G2. Hence, while G1 is active, G2 may fail to be active
because the regulatory strength of G3 is greater than the regulatory strength of G1: the expected
activation is contradicted by a hidden inhibition. Pushed to the limit, this consideration prevents
describing any behaviour causally, because any programmed action can be unexpectedly preempted by
an external one.

However, by assuming that the design always describes a new functionality that is not observed
naturally, the effect becomes the event indicating the effectiveness of a causal relation. As no
cause external to the description can trigger the effect, over-determination by unknown causes is
prevented, which ensures that the program is the sole device entailing the expected effect in the
biological system. Hence, the definition of the causal dependence is governed by the effect,
leading to the following definition of the dependence: "if effect e occurs then c occurs".
Moreover, the scope of the future (resp. past) is narrowed to the closest future (resp. past)
period, representing the fact that a response is always expected within a given delay. Notice that
the proposed definition circumvents the aforementioned problem illustrated by the hidden
inhibition, because if the effect does not occur, the question of the existence of a cause is
meaningless. This definition is somehow equivalent to the causal claims proposed by Lewis [18] in
terms of counterfactual conditionals, i.e., "If c had not occurred, e would not have occurred".

Three behavioural dependences are defined in GUBS: the normal one, denoted by ○→, the persistent
one, by ⊙→, and the remanent one, by ●→. Informally, for the normal dependence, the cause precedes
the effect provided the effect is observed; for the persistent dependence, the cause still precedes
the effect but is also maintained while the effect is observed; and for the remanent dependence,
the effect is maintained although the cause has disappeared. These dependences symbolize common
biological interactions. For instance, in genetic engineering, recombination enables the permanent
emergence of a regulated gene or of a hereditary trait. Such a mechanism typifies the remanent
dependence in biology. The relations between gene expressions at steady state are symbolized by
the persistent dependence. The behavioural dependences are defined as follows (see Section 2.1 for
their formalization):

• c ○→ e: if e occurs then c occurs in the closest past.

• c ⊙→ e: if e occurs then c occurs in the closest past and also currently.

• c ●→ e: if e occurs then either e occurs in the closest past or the dependence complies with the
property of the normal dependence.
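
The three definitions can be checked on a history with a short Python sketch. It is written under simplifying assumptions that are not part of GUBS itself: the history is linear, the "closest past" of a period is the period immediately before it, and an effect observed in the very first period has no past and therefore violates the dependence. The function name `holds` and the string encoding of the dependence kinds are hypothetical.

```python
def holds(kind, cause, effect, hist):
    """Check a dependence cause -> effect on a linear history (list of sets)."""
    for i, now in enumerate(hist):
        if effect not in now:
            continue  # the dependence only constrains periods where the effect occurs
        past = hist[i - 1] if i > 0 else None
        if kind == "normal":
            ok = past is not None and cause in past
        elif kind == "persistent":
            ok = past is not None and cause in past and cause in now
        elif kind == "remanent":
            ok = past is not None and (effect in past or cause in past)
        else:
            raise ValueError(f"unknown dependence kind: {kind}")
        if not ok:
            return False
    return True


h1 = [{"c"}, {"c", "e"}, {"e"}]  # the cause disappears at the last period
h2 = [{"c"}, {"e"}, {"e"}]       # the effect outlives the cause by two periods

for kind in ("normal", "persistent", "remanent"):
    print(kind, holds(kind, "c", "e", h1), holds(kind, "c", "e", h2))
```

On these two histories, only the remanent dependence tolerates an effect that outlives its cause for more than one period, which matches the informal description above.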

Figure 1 exemplifies the correspondence between experimental traces, symbolic traces and histories
for the causal dependences. All the dependences are extended to a set of causes and a set of
consequences, i.e., $c_1 + \ldots + c_n$ ○→ $e_1 + \ldots + e_m$. For example, let us define the
activation and the inhibition as follows:
$g_1 \rightarrow g_2$ = $g_1$ ⊙→ $g_2$, $\overline{g_1}$ ○→ $\overline{g_2}$ and
$g_1 \dashv g_2$ = $\overline{g_1}$ ⊙→ $g_2$, $g_1$ ○→ $\overline{g_2}$. The program depicting a
negative regulatory circuit with two genes, i.e., $g_1 \rightarrow g_2$, $g_2 \dashv g_1$, is:
{$g_1$ ⊙→ $g_2$, $\overline{g_1}$ ○→ $\overline{g_2}$, $\overline{g_2}$ ⊙→ $g_1$, $g_2$ ○→ $\overline{g_1}$}.

The observation spots describe the set of observations expected in a trace. For instance,
observing that gene G is at level High is written Obs::G(High). As the activation of a dependence
relies on the observation of the effect, the observation spots are used to determine which effects
must necessarily be observed. For example, in the negative regulatory circuit, the characteristic
observation spots are: $obs_1 :: g_1 + \overline{g_2}$, $obs_2 :: \overline{g_1} + g_2$.

Compartment & Context. A compartment encloses a set of dependences, making them local to the
compartment. For instance, C{$g_1$ ○→ $g_2$} describes a normal dependence occurring in
compartment C. The compartments are hierarchically organized: every compartment is included in
another one, except for the outermost one. Although the compartments directly refer to the
compartmentalized cellular organization (e.g., nucleus, mitochondria), they are also used to
emphasize the isolation of some interactions by syntactically enclosing the dependences in a
compartment. C.s refers to an agent state s in compartment C.

A context refers to a stimulus acting on the system, such as environmental conditions or external
signalling. The application of a context c to a set of dependences b is written [c]b, where c is
either a variable or a constant. This means that the dependences of b are triggered when the
context c is present. For instance, Ye et al. [26] recently explored optogenetic signalling to
control the expression of target transgenes. Blue light induces the expression of a transgene (tg)
via a signalling cascade leading to the binding of the NFAT transcription factor to a specific
promoter (PNFAT). The following program using a context summarizes the process:
[BlueLight]{NFAT ⊙→ tg}. A context can be decomposed into several contexts,
$[k_1, \ldots, k_n]b$, meaning that all the conditions must be met to trigger the dependences of b.
The interpretation is equivalent to a cascade of contexts, $[k_1][k_2]\ldots[k_n]b$. Moreover, the
observation spots and the attribute definitions are context insensitive.

2.1 Semantics of GUBS

The interpretation of a GUBS program is a formula such that the set of all the models validating it
defines all the possible histories complying with the programmed behaviour. The interpretation is
based on multi-modal hybrid logic with the "Always" operator, $\mathcal{H}(A, @)$.

Hybrid logic. In what follows, we recall the formal syntax and semantics of hybrid logic. Hybrid
logic [5,6] offers the possibility to name worlds by new symbols called nominals. They are used in
the satisfaction modal operators $@_a$: the formula $@_a\phi$ asserts that $\phi$ is satisfied at
the unique point named by the nominal $a$, identifying the truth value of a formula at this
particular point. Given a set of propositional symbols PROP, a set of relational symbols REL, and a
set of nominals NOM disjoint from PROP, the set of well-formed formulas over the signature
(PROP, NOM, REL) is defined as follows:

$$\phi ::= \top \mid p \mid a \mid \neg\phi \mid \phi \wedge \phi \mid @_a\phi \mid \langle k\rangle\phi \mid \langle k\rangle^{-}\phi \mid A\phi,$$

with $p \in$ PROP, $a \in$ NOM and $k \in$ REL. Moreover, the syntax is classically extended² to
the other logical operators: $\bot, \vee, \rightarrow, [k], E$.



The interpretation is carried out using the Kripke model satisfaction definition (Table 1).
$\mathcal{M}, w \Vdash \phi$ is interpreted as the satisfaction of the formula $\phi$ by the model
$\mathcal{M}$ at world $w$, where $\Vdash$ stands for the realizability relation (i.e., "is a model
of"). A model validates a formula, denoted by $\mathcal{M} \Vdash \phi$, if and only if the formula
is satisfied at all the worlds of the model (i.e.,
$\forall w \in \mathrm{Dom}\,\mathcal{M}: \mathcal{M}, w \Vdash \phi$).

Definition 1 (Kripke model). A Kripke model is a structure
$\mathcal{M} = (W, (R_k)_{k \in \tau}, V)$ where $W = \mathrm{Dom}\,\mathcal{M}$ is a non-empty set
of worlds, $\tau \subseteq$ REL a subset of relational symbols denoting the modalities,
$R_k \subseteq W \times W$, $k \in \tau$, an accessibility relation, and
$V : (\mathrm{PROP} \cup \mathrm{NOM}) \rightarrow 2^W$ an interpretation attributing to each
nominal and propositional variable a set of worlds such that any nominal addresses at most one
world (i.e., $\forall a \in \mathrm{NOM}: |V(a)| \le 1$).

By convention, $R$ stands for the union of the accessibility relations,
$R = \bigcup_{k \in \tau} R_k$.

A modal theory of a model $\mathcal{M}$ with regard to a set of formulas $F$,
$TH_F(\mathcal{M})$, is the set of formulas of $F$ validated by $\mathcal{M}$, i.e.,
$TH_F(\mathcal{M}) = \{\phi \in F \mid \mathcal{M} \Vdash \phi\}$. $KS(\phi)$ denotes the set of
all models validating $\phi$, i.e., $KS(\phi) = \{\mathcal{M} \mid \mathcal{M} \Vdash \phi\}$.



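$$
\begin{array}{lcl}
\mathcal{M}, w \Vdash \top & \text{iff} & \text{true} \\
\mathcal{M}, w \Vdash a & \text{iff} & w \in V(a),\ a \in \mathrm{NOM} \cup \mathrm{PROP} \\
\mathcal{M}, w \Vdash \neg\phi & \text{iff} & \mathcal{M}, w \nVdash \phi \\
\mathcal{M}, w \Vdash \phi_1 \wedge \phi_2 & \text{iff} & \mathcal{M}, w \Vdash \phi_1 \text{ and } \mathcal{M}, w \Vdash \phi_2 \\
\mathcal{M}, w \Vdash @_a\phi & \text{iff} & \exists w' \in W : \mathcal{M}, w' \Vdash \phi \text{ and } \{w'\} = V(a) \\
\mathcal{M}, w \Vdash \langle k\rangle\phi & \text{iff} & \exists w' \in W : \mathcal{M}, w' \Vdash \phi \text{ and } w R_k w' \\
\mathcal{M}, w \Vdash \langle k\rangle^{-}\phi & \text{iff} & \exists w' \in W : \mathcal{M}, w' \Vdash \phi \text{ and } w' R_k w \\
\mathcal{M}, w \Vdash A\phi & \text{iff} & \forall w' \in W : \mathcal{M}, w' \Vdash \phi
\end{array}
$$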



Table 1 : Hybrid logic interpretation. 
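
The satisfaction clauses of Table 1 translate almost literally into a recursive evaluator over finite models. The following Python sketch is an illustration only: the tuple encoding of formulas, the triple `(worlds, rel, val)` used for a model and the helper names are all assumptions introduced here, not part of the paper.

```python
def sat(model, w, f):
    """Satisfaction of formula f at world w, one branch per clause of Table 1."""
    worlds, rel, val = model
    op = f[0]
    if op == "top":
        return True
    if op == "atom":  # propositional symbol or nominal
        return w in val.get(f[1], set())
    if op == "not":
        return not sat(model, w, f[1])
    if op == "and":
        return sat(model, w, f[1]) and sat(model, w, f[2])
    if op == "at":  # @_a phi: phi holds at the unique world named by a
        return any(sat(model, v, f[2]) for v in worlds if val.get(f[1], set()) == {v})
    if op == "dia":  # <k> phi: some k-successor satisfies phi
        return any(sat(model, v, f[2]) for v in worlds if (w, v) in rel.get(f[1], set()))
    if op == "dia_inv":  # <k>^- phi: some k-predecessor satisfies phi
        return any(sat(model, v, f[2]) for v in worlds if (v, w) in rel.get(f[1], set()))
    if op == "A":  # phi holds at every world
        return all(sat(model, v, f[1]) for v in worlds)
    raise ValueError(f"unknown operator: {op}")


def implies(a, b):
    """a -> b, encoded as not(a and not b)."""
    return ("not", ("and", a, ("not", b)))


def validates(model, f):
    """A model validates f when f is satisfied at all its worlds."""
    return all(sat(model, w, f) for w in model[0])


# A three-world linear model: 1 -> 2 -> 3, with world 3 named by the nominal "obs".
MODEL = ({1, 2, 3}, {"k": {(1, 2), (2, 3)}}, {"c": {1, 2}, "e": {2, 3}, "obs": {3}})
C, E = ("atom", "c"), ("atom", "e")
NORMAL = implies(E, ("dia_inv", "k", C))
PERSISTENT = implies(E, ("and", ("dia_inv", "k", C), C))

print(validates(MODEL, NORMAL), validates(MODEL, PERSISTENT), sat(MODEL, 1, ("at", "obs", E)))
```

In this model, the formula shaped like a normal dependence is validated, while the persistent-shaped one fails at world 3, where `e` holds without `c`.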



Semantics. A GUBS program is interpreted by a hybrid logic formula in which the modal operators
characterize the temporal observations on a history: $[\,]$ means "observed in all the closest
futures" and $\langle\,\rangle$ means "observed in at least one possible closest future" (resp.
$[\,]^{-}$, $\langle\,\rangle^{-}$ for the closest past). Moreover, we assume that the
accessibility relations, $(R_k)_{k \in \tau}$, are indexed by the non-empty


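2 $\bot = \neg\top$, $\phi \vee \psi = \neg(\neg\phi \wedge \neg\psi)$,
$\phi \rightarrow \psi = \neg(\phi \wedge \neg\psi)$, $[k]\phi = \neg\langle k\rangle\neg\phi$,
$E\phi = \neg A\neg\phi$.

parts of the set of all the contexts of a program P, denoted by $K_P$ (i.e.,
$\tau = 2^{K_P} \setminus \{\emptyset\}$). Then, a non-empty set of contexts $K$,
$\emptyset \subset K \subseteq K_P$, is a modality, i.e., $\langle K\rangle$, $[K]$, with
$\langle\,\rangle = \langle\emptyset\rangle$ by convention. Let $(\mathcal{W}, \cdot, \lambda)$ be
the set of words $\mathcal{W}$ with the concatenation operation and its neutral element, the empty
word $\lambda$, and $F_{\mathcal{H}}$ the set of well-formed formulas of $\mathcal{H}(A, @)$. The
semantics is defined by four functions:
$[\![.]\!] : \mathbb{P} \rightarrow F_{\mathcal{H}}$,
$[\![.]\!]_P : \mathbb{P} \rightarrow \mathcal{W} \rightarrow 2^{K_P} \rightarrow F_{\mathcal{H}}$,
$[\![.]\!]_B : \mathbb{B} \rightarrow \mathcal{W} \rightarrow F_{\mathcal{H}}$,
$[\![.]\!]_R : \mathbb{R} \rightarrow \mathcal{W} \rightarrow F_{\mathcal{H}}$,
where $\mathbb{P}$, $\mathbb{B}$, $\mathbb{R}$ respectively stand for the set of GUBS programs, the
set of agent state sets and the set of relations on attributes. $[\![.]\!]$ initiates the
interpretation. Table 2 defines these functions. For instance, the program of the negative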



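$$
\begin{array}{lcl}
[\![\{b\}]\!] & = & A\big([\![b]\!]_P(\lambda)(\emptyset)\big) \\
[\![\varepsilon]\!]_P(C)(K) & = & \top \\
[\![b_1, b_2]\!]_P(C)(K) & = & [\![b_1]\!]_P(C)(K) \wedge [\![b_2]\!]_P(C)(K) \\
[\![s_1 \mathrel{\circ\!\!\to} s_2]\!]_P(C)(K) & = & [\![s_2]\!]_B(C) \rightarrow \langle K\rangle^{-}\big([\![s_1]\!]_B(C)\big) \\
[\![s_1 \mathrel{\odot\!\!\to} s_2]\!]_P(C)(K) & = & [\![s_2]\!]_B(C) \rightarrow \big(\langle K\rangle^{-}[\![s_1]\!]_B(C) \wedge [\![s_1]\!]_B(C)\big) \\
[\![s_1 \mathrel{\bullet\!\!\to} s_2]\!]_P(C)(K) & = & [\![s_2]\!]_B(C) \rightarrow \big((\langle K\rangle^{-}[\![s_2]\!]_B(C)) \vee (\langle K\rangle^{-}[\![s_1]\!]_B(C))\big) \\
[\![g_1, \ldots, g_n :: \{r_1, \ldots, r_m\}]\!]_P(C)(K) & = & \bigwedge_{i=1}^{n} \bigwedge_{j=1}^{m} [\![r_j]\!]_R(C.g_i) \\
[\![l :: s]\!]_P(C)(K) & = & @_l\,[\![s]\!]_B(C) \\
[\![C'\{b\}]\!]_P(C)(K) & = & [\![b]\!]_P(C.C')(K) \\
[\![[K]\{b\}]\!]_P(C)(K') & = & [\![b]\!]_P(C)(K \cup K') \\
[\![s_1 + \ldots + s_n]\!]_B(C) & = & \bigwedge_{i=1}^{n} [\![s_i]\!]_B(C) \\
[\![C'.s]\!]_B(C) & = & [\![s]\!]_B(C.C') \\
[\![g(a)]\!]_B(C) & = & C.g_a \\
[\![\overline{g}(a)]\!]_B(C) & = & \neg C.g_a \\
[\![g]\!]_B(C) & = & C.g \\
[\![\overline{g}]\!]_B(C) & = & \neg C.g \\
[\![a_1 < a_2]\!]_R(g) & = & g_{a_2} \rightarrow g_{a_1} \\
[\![a_1 \neq a_2]\!]_R(g) & = & (g_{a_1} \rightarrow \neg g_{a_2}) \wedge (g_{a_2} \rightarrow \neg g_{a_1}) \\
[\![a]\!]_R(g) & = & \top
\end{array}
$$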

Table 2: Semantics of GUBS. In the definition, a represents an attribute, g an agent, s a set of
agent states or an agent state, r a relation on attributes, l an observation spot label, C a
compartment, K a set of contexts and b a set of behaviours (i.e., contexts, compartments,
dependences, attributes, observation spots).

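regulatory network,
{$g_1$ ⊙→ $g_2$, $\overline{g_1}$ ○→ $\overline{g_2}$, $\overline{g_2}$ ⊙→ $g_1$, $g_2$ ○→ $\overline{g_1}$, $obs_1 :: g_1 + \overline{g_2}$, $obs_2 :: \overline{g_1} + g_2$},
is translated into the following formula:

$$
\begin{array}{l}
A\big((g_2 \rightarrow (\langle\,\rangle^{-} g_1 \wedge g_1)) \wedge (\neg g_2 \rightarrow \langle\,\rangle^{-}\neg g_1) \wedge (g_1 \rightarrow (\langle\,\rangle^{-}\neg g_2 \wedge \neg g_2)) \wedge (\neg g_1 \rightarrow \langle\,\rangle^{-} g_2)\ \wedge \\
\quad @_{obs_1}(g_1 \wedge \neg g_2) \wedge @_{obs_2}(\neg g_1 \wedge g_2)\big)
\end{array}
$$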
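
The interpretation of the three dependences can be sketched as a small translator that prints formulas as strings. The sketch below makes several simplifications that are assumptions of this illustration and not of GUBS: compartments and attribute relations are ignored, a negated agent state is written with a leading `~` in the input, and the names `interpret`, `conj` and `lit` are hypothetical.

```python
def lit(state):
    """Render an agent state; a leading '~' encodes the negated state."""
    return "¬" + state[1:] if state.startswith("~") else state


def conj(states):
    """Render a collection s1 + ... + sn as a conjunction."""
    parts = [lit(s) for s in states]
    return parts[0] if len(parts) == 1 else "(" + " ∧ ".join(parts) + ")"


def interpret(kind, causes, effects, ctx=""):
    """Formula of a dependence causes -> effects under the context set ctx."""
    past = f"⟨{ctx}⟩⁻"  # closest-past modality
    c, e = conj(causes), conj(effects)
    if kind == "normal":
        body = f"{past}{c}"
    elif kind == "persistent":
        body = f"({past}{c} ∧ {c})"
    elif kind == "remanent":
        body = f"({past}{e} ∨ {past}{c})"
    else:
        raise ValueError(f"unknown dependence kind: {kind}")
    return f"{e} → {body}"


print(interpret("persistent", ["g1"], ["g2"]))
print(interpret("normal", ["~g1"], ["~g2"]))
print(interpret("remanent", ["c"], ["e"], ctx="k"))
```

Each printed line corresponds to one conjunct under the "Always" operator of the program interpretation.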



Consistent history. We now formally define the consistency of a history with regard to models. A
history is assimilated to a path in a model ending with a world labelled by an observation spot
label. The set of Kripke models validating the interpretation of a program P, $KS([\![P]\!])$,
contains not only all the consistent histories, but also the possible histories corresponding to
behavioural alterations due to external perturbations. Thus, the compilation generates a device
such that all the models validating its interpretation integrate all the observations related to
the program, including the consistent and the inconsistent ones.

More precisely, consistency relies on the identification of the largest number of "relevant" events
characterizing a complete causal chain described in a program. As a history is also a model, a
consistent history should validate the interpretation of the complete causal chain. The dependence
formula set $F_P$ of a program P is the set of formulas where each formula is the interpretation of
one dependence taken separately, with the attributes related to the involved agents. By definition
of the semantics, any model validating the interpretation of a program also validates each formula
of this set. The consistency of a history is then based on the formulas of this set that the
history validates. A history $\mathcal{M}_H$ is consistent for P if and only if the modal theory of
no other history based on $F_P$ (i.e., $TH_{F_P}(\mathcal{M})$ with $\mathcal{M}$ a history) ending
with the same labelled world strictly includes the modal theory of this history (i.e.,
$TH_{F_P}(\mathcal{M}_H) \not\subset TH_{F_P}(\mathcal{M})$).

3 Compilation 

At compile phase, a program is transformed into a structure (e.g., a DNA sequence) which, once
inserted in a vector cell, should behave according to the programmed specification. The structure
results from an assembly of several devices stored in a library of components (e.g., a parts
registry). As the design relates here to a behavioural/functional description, we need to bridge
the gap between the structural and the functional description. This stage is called the functional
synthesis. The issue is to select a set of components whose assembly preserves the behaviour of the
program. To achieve this goal, a GUBS program is associated with each component to describe its
behaviour. Thereby, the component assembly corresponds to a program assembly preserving the
behaviour of the compiled program. Preserving a behaviour relies on a property called the
behavioural inclusion, which formalizes the fact that the characteristic observational traits of
the specified function must be recognized in the traces obtained from experiments on the device. In
other words, we can exhibit histories consistent with the programmed behaviour from histories
consistent with the behaviour description of the device. The behavioural inclusion is defined from
the interpretation of the programs, as a logical consequence (Definition 2).

Definition 2 (Behavioral inclusion). A program Q behaviourally includes another program P if and
only if the interpretation of the latter is a logical consequence of the interpretation of the
former:

$$P \sqsubseteq Q \;\triangleq\; \forall \mathcal{M} : \mathcal{M} \Vdash [\![Q]\!] \Longrightarrow \mathcal{M} \Vdash [\![P]\!].$$

The behavioural inclusion is a pre-order³ such that the empty program, denoted by ε, is a minimum,
meaning that a program with no behaviour can be observed in all traces, and a program whose
interpretation equals $\bot$ is a maximum. Figure 2 illustrates the behavioural inclusion on a
particular model.

Observability. It may happen that no history is consistent with a programmed behaviour. For
example, the program {Obs :: g, $\overline{g}$ ⊙→ g} is not observable in any trace. Indeed, its
interpretation yields the formula
$A\big((@_{Obs}\, g) \wedge (g \rightarrow ((\langle\,\rangle^{-}\neg g) \wedge \neg g))\big)$,
which is false in all models because world Obs must satisfy both $g$ and $\neg g$ by definition of
the persistent dependence. A GUBS program is said to be observable if and only if the formula
resulting from its interpretation is validated by at least one model. Hence, the interpretation of
an unobservable program is an antilogy. An unobservable program can be regarded as a programming
error. Such errors can be detected at compile phase by using the tableau method [9], which
automatically determines whether a formula is satisfiable in a model. Indeed, GUBS uses a fragment
of the hybrid logic $\mathcal{H}(@)$, which is decidable. Notice that a program behaviourally
included in an observable program is always observable (Proposition 1).
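
The unobservable example above can be reproduced with a bounded brute-force search. The sketch below is not the tableau method of [9] and is not a decision procedure: it only enumerates linear histories up to a fixed length, under the assumptions that the closest past of a world is its immediate predecessor and that the observation spot labels the last world. All names are hypothetical.

```python
from itertools import product


def observable_within(atoms, constraint, obs, max_len=4):
    """Return a linear history satisfying the program, or None if none exists up to max_len.

    constraint(hist, i) must hold at every world i; obs must hold at the last world.
    """
    for n in range(1, max_len + 1):
        for values in product(product([False, True], repeat=len(atoms)), repeat=n):
            hist = [dict(zip(atoms, v)) for v in values]
            if obs(hist[-1]) and all(constraint(hist, i) for i in range(n)):
                return hist
    return None


def persistent_notg_g(hist, i):
    """g -> (not g in the closest past) and (not g currently)."""
    if not hist[i]["g"]:
        return True
    return i > 0 and not hist[i - 1]["g"] and not hist[i]["g"]


def normal_notg_g(hist, i):
    """g -> (not g in the closest past)."""
    if not hist[i]["g"]:
        return True
    return i > 0 and not hist[i - 1]["g"]


def obs_g(world):
    """Observation spot Obs :: g."""
    return world["g"]


print(observable_within(["g"], persistent_notg_g, obs_g))  # no history found
print(observable_within(["g"], normal_notg_g, obs_g))      # a two-world witness
```

Replacing the persistent dependence by a normal one makes the program observable, with a two-world history as witness.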

3 A reflexive and transitive relation.

[Figure 2: listings of the programs P and Q and the model illustrating their behavioural
inclusion; the figure content is not recoverable from the extracted text.]

Figure 2: Behavioral inclusion example. Consistent histories of P necessarily contain the worlds
coloured in gray.



Proposition 1. A program behaviourally included in an observable program is observable:
$\forall P, Q \in \mathbb{P} : obs\,Q \wedge P \sqsubseteq Q \Longrightarrow obs\,P$.



3.1 Functional synthesis 

The functional synthesis is the operation whereby biological components of a library are selected
and assembled to generate a device behaviourally including the designed function. The behaviour of
each component is described by a GUBS program. At its simplest, the functional synthesis could be
considered as a proper substitution of variables by constants. For example, in the activation
{$G_1 \rightarrow g_2$}, $g_2$ will be substituted by gene $G_2$, provided that a component Q
describes the activation {$G_1 \rightarrow G_2$}. However, more complex situations may arise during
component selection. For example, if the activation $G_1 \rightarrow G_2$ only occurs together with
another regulation, i.e., Q = {$G_1 \rightarrow G_2$, $G_3 \rightarrow G_4$}, then the selection of
Q adds a supplementary regulation.

Formally, a finite substitution is a set of mappings, $\sigma = \{v_i/b_i\}_i$, on variables and
constants such that a variable can be substituted by a variable or a constant, and a constant can
only be substituted by itself⁴. For instance, we have:
{Obs::G(l) + $b_2$, $b_1$ ○→ G(l)}[{$b_1 \mapsto B_1$, $b_2 \mapsto B_2$, $l \mapsto Low$}] =
{Obs::G(Low) + $B_2$, $B_1$ ○→ G(Low)}.
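
The substitution example above can be replayed with a minimal Python sketch. The encoding is an assumption of this illustration: an agent state is a pair `(agent, attribute or None)`, a program is a list of tagged tuples, and constants are recognized by their initial capital letter, following the naming convention of Section 2. Observation spot labels are left untouched.

```python
def subst(symbol, sigma):
    """Apply sigma to one symbol; constants (capitalized) only map to themselves."""
    if symbol is None or symbol[0].isupper():
        return symbol
    return sigma.get(symbol, symbol)  # unmapped variables are kept (identity)


def apply_state(state, sigma):
    agent, attr = state
    return (subst(agent, sigma), subst(attr, sigma))


def apply_program(program, sigma):
    """Apply a substitution to every agent state of a program."""
    out = []
    for item in program:
        if item[0] == "obs":
            _, label, states = item
            out.append(("obs", label, [apply_state(s, sigma) for s in states]))
        else:
            kind, causes, effects = item
            out.append((kind,
                        [apply_state(s, sigma) for s in causes],
                        [apply_state(s, sigma) for s in effects]))
    return out


# {Obs::G(l) + b2, b1 normal-> G(l)} with sigma = {b1 -> B1, b2 -> B2, l -> Low}
PROGRAM = [("obs", "Obs", [("G", "l"), ("b2", None)]),
           ("normal", [("b1", None)], [("G", "l")])]
SIGMA = {"b1": "B1", "b2": "B2", "l": "Low"}

print(apply_program(PROGRAM, SIGMA))
```

The result corresponds to {Obs::G(Low) + B2, B1 ○→ G(Low)}; the constant G is unchanged even if a mapping for it were supplied.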



Functional synthesis rules. The functional synthesis is defined by rules (Table 3) governing the
component assembly. Only the dependences and the attributes are functionally synthesized. The
observation spots are considered as annotations used by the compilation process. To ensure
correctness, each transformation must preserve the original behaviour. Hence, each program
resulting from the application of a rule must behaviourally include the previous one. Formally, the
functional synthesis is modelled by a relation on programs denoted by ←, i.e.,
$Q \leftarrow_\sigma P$, where P is the initial program and Q the transformed one, such that each
rule ensures that $Q \leftarrow_\sigma P$ is correct with regard to a substitution σ, that is,
$P[\sigma] \sqsubseteq Q[\sigma]$ and $Q[\sigma]$ is observable. Also notice that the behavioural
inclusion is preserved by substitution (Proposition 2).

Proposition 2. For all substitutions σ, we have:
$P \sqsubseteq Q \Longrightarrow P[\sigma] \sqsubseteq Q[\sigma]$.

4 $P\sigma$ or $P[\sigma]$ represents the application of σ to program P; identity substitutions
are omitted.

Table 3 describes the functional synthesis rules⁵. Γ is a set of components representing the
library. $P \subseteq_{Asm} Q$ denotes the fact that program Q corresponds to an assembly including
P, i.e., $Q = (Q_1, P, Q_2)$ where $Q_1$ or $Q_2$ may be an empty program. Rule (Inst.) describes
the fact that an observable instance of a


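- INSTANTIATION -

$$\frac{Q[\sigma] \supseteq_{Asm} P[\sigma] \qquad obs(Q[\sigma]) \qquad Q \in \Gamma}{Q \leftarrow_\sigma P}\ \text{(Inst.)}$$

- COMMUTATIVITY, CONTRACTION -

$$\frac{Q \leftarrow_\sigma P, P'}{Q \leftarrow_\sigma P', P}\ \text{(Com.)} \qquad\qquad \frac{Q \leftarrow_\sigma P}{Q \leftarrow_\sigma P, P}\ \text{(Cont.)}$$

- ASSEMBLY -

$$\frac{Q \leftarrow_\sigma P \qquad Q' \leftarrow_{\sigma'} P' \qquad \sigma|_{VA(P) \cap VA(P')} = \sigma'|_{VA(P) \cap VA(P')} \qquad obs(Q[\sigma], Q'[\sigma'])}{Q, Q' \leftarrow_{\sigma \cup \sigma'} P, P'}\ \text{(Asm.)}$$

Table 3: Functional synthesis rules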

part of a component in the library is functionally synthesized. Rule (Com.) expresses the commutativity 
of the assembly. Rule (Cont.) contracts the redundant formulation of programs. Finally, Rule (Asm.) 
details the conditions for an assembly of two components, each representing a functional synthesis of a 
part of the designed function. A detailed example of their use on a real case is given in Section 4.
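The side condition of Rule (Asm.) on substitutions can be sketched as follows. This is only an illustration under stated assumptions: substitutions are encoded as Python dictionaries, the helper names `image`, `agree_on` and `assemble` are hypothetical, and the observability premise of the rule is not checked here.

```python
from typing import Dict, Iterable, Optional

Substitution = Dict[str, str]


def image(sigma: Substitution, name: str) -> str:
    # Identity substitutions are omitted, so a missing name maps to itself.
    return sigma.get(name, name)


def agree_on(sigma: Substitution, sigma_prime: Substitution,
             shared: Iterable[str]) -> bool:
    """Check that both substitutions coincide on the shared variables."""
    return all(image(sigma, v) == image(sigma_prime, v) for v in shared)


def assemble(sigma: Substitution, sigma_prime: Substitution,
             shared: Iterable[str]) -> Optional[Substitution]:
    """Return the union of both substitutions when they agree on the
    shared variables, and None otherwise."""
    if not agree_on(sigma, sigma_prime, shared):
        return None
    merged = dict(sigma)
    # Entries outside `shared` are assumed not to clash in this sketch.
    merged.update(sigma_prime)
    return merged


# Hypothetical substitutions for two parts of a program sharing the variable `light`.
sigma = {"light": "Light", "v1": "Tetr", "v2": "LuxI"}
sigma_prime = {"light": "Light", "v3": "Tetr", "v4": "LuxI"}
merged = assemble(sigma, sigma_prime, shared={"light"})
clash = assemble(sigma, {"light": "Dark"}, shared={"light"})
```

Here `merged` contains the five bindings of both substitutions, whereas `clash` is `None` because the two substitutions disagree on `light`.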

Theorem 1. The functional synthesis rules (Table 3) are correct.

- DEPENDENCES - g ^ 5 ' ^ S ^2 ^ S 3 ,A Q^S^S 2 ,A Q^ aSl O+S 2 ,A 

e^5,^5 3 ,A (iranS ' ) Ge ff Si Oh. 5i,A (N2K) S r S 2 ,A (K2N ° 



AGENT STATES 



(SCom.) — (SCont.) — - — (Incl.) 



S 2 + 5i S + s + s 



Table 4: Rules for the dependences and the agent states. $S_i$ stands for a collection, $s_1 + \ldots + s_n$, of agent states, including negation, and A stands for the rest of the program.

Another set of rules, more specifically devoted to dependences (Table 4), defines the alternate possibilities to express similar behaviours. The table also includes the rules for agent states. Rule (Trans.) expands a chain of persistent dependences by adding an intermediary dependence to refine a pathway. Rule (N2R) transforms a normal dependence into a persistent one, since the latter is a normal dependence with an additional property. Rule (R2N) transforms a remanent dependence into a normal dependence, since a normal dependence is also a remanent dependence whose effect is repeated for one step only. According to these rules, all the dependence chains can be implemented with persistent dependences.
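The refinement performed by Rule (Trans.) can be sketched in Python. This is a minimal illustration, not the authors' implementation: a persistent dependence is encoded as a (cause, effect) pair, and the helpers `fresh_variables` and `refine` are hypothetical names. The agent names are borrowed from the Band Detector example of Section 4.

```python
from itertools import count
from typing import Iterator, List, Tuple

Dependence = Tuple[str, str]  # (cause, effect) of a persistent dependence


def fresh_variables(prefix: str = "v") -> Iterator[str]:
    """Yield v1, v2, ...; the caller must ensure these names are unused."""
    return (f"{prefix}{i}" for i in count(1))


def refine(dependence: Dependence, steps: int,
           names: Iterator[str]) -> List[Dependence]:
    """Split one persistent dependence into a chain with `steps`
    intermediary agents, as repeated applications of (Trans.) would."""
    cause, effect = dependence
    chain = [cause] + [next(names) for _ in range(steps)] + [effect]
    # Consecutive agents of the chain form the refined dependences.
    return list(zip(chain, chain[1:]))


# Two intermediaries correspond to applying the rule twice.
pathway = refine(("detect", "AHL(low)"), 2, fresh_variables())
```

With two intermediaries, `pathway` is the chain from `detect` to `v1`, from `v1` to `v2`, and from `v2` to `AHL(low)`.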

5 Rules are of the form: $\frac{\text{hypothesis}}{\text{conclusion}}$.

A possible algorithm for the assembly could be based on a combinatorial application of the rules. However, such an algorithm may prove inefficient in practice. An efficient compilation algorithm should rather be based on an internal representation of a program as a set of contextualized dependences with attributes, $\{\{A, [K]\,S_1 \circledast\!\!\rightarrow S_2\}\}$, such that $A$, $K$, $S_1$, $S_2$ are respectively: a set of attribute specifications related to the agents involved in the dependency, a set of contexts, and sets of agent states. Any program can be encoded under this representation from a normal form of the program (not detailed here). Accordingly, the problem solved by the compilation algorithm can be defined as follows (Definition 3):

Definition 3 (Functional Synthesis Problem). Let $T = \{Q_i\}_{1 \le i \le n}$ be a set where each $Q_i$ is a set of contextualized dependences with attributes, and let $P$ be a set of contextualized dependences with attributes. Can we find the smallest observable subset of components $C \subseteq T$ such that there exists a substitution $\sigma$ whose application on the components of $C$ forms a cover of $P[\sigma]$, i.e., $\exists \sigma : P[\sigma] \subseteq \bigcup_{Q_i \in C} Q_i[\sigma] \wedge obs\ C$?

As the set cover problem is reducible to this problem, the problem is NP-hard. The resolution is therefore oriented towards a heuristic algorithm.
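To give an idea of what such a heuristic could look like, the following Python sketch applies the classical greedy strategy for set cover to a ground instance of the problem. It is not the algorithm of the authors: the substitution is assumed to be already applied, observability is not checked, dependences are encoded as plain (cause, effect) pairs, and the library content is a simplified excerpt made up for this illustration.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Dependence = Tuple[str, str]  # ground (cause, effect) pair


def greedy_cover(target: Set[Dependence],
                 library: Dict[str, FrozenSet[Dependence]]) -> Optional[List[str]]:
    """Greedy set cover: repeatedly pick the component covering the most
    still-uncovered dependences. Returns None when no cover exists."""
    uncovered = set(target)
    chosen: List[str] = []
    while uncovered:
        # Ties are broken by component name to keep the result deterministic.
        best = max(sorted(library),
                   key=lambda name: len(uncovered & library[name]),
                   default=None)
        gained = uncovered & library[best] if best is not None else set()
        if not gained:
            return None  # no component helps: the target cannot be covered
        chosen.append(best)
        uncovered -= gained
    return chosen


# Simplified, hypothetical excerpt of a component library.
library = {
    "Q1": frozenset({("detect", "Tetr")}),
    "Q2": frozenset({("Tetr", "LuxI")}),
    "Q3": frozenset({("LuxI", "AHL(low)"), ("LuxI", "AHL(mid)"),
                     ("LuxI", "AHL(high)")}),
    "Q6": frozenset({("CI", "LacI")}),
}
target = {("detect", "Tetr"), ("Tetr", "LuxI"), ("LuxI", "AHL(low)")}
selection = greedy_cover(target, library)
```

The greedy choice does not guarantee the smallest cover in general; it only gives a cover obtained in polynomial time, which is the usual trade-off for this family of problems.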

4 Example 




Figure 3: The band detector regulatory circuit. 



The compilation process is exemplified here on a real case, the design of the Band Detector proposed in 0. This example explains how a complex design can be synthesized from a simple abstract definition of the functionality. Accordingly, GUBS may be used to describe a behaviour at a high level of programming as well as at a low level, detailing the components involved in the process. Although the functional synthesis is not yet performed automatically, it is worth pointing out that the successive transforms of the high-level program into the final design comply with the rules of Tables 3 and 4, ensuring its correctness and thus its functional safety in the context of an open system.

The design aims at forming patterns of different colours in a population of bacteria by exploiting the quorum sensing phenomenon and staining with a fluorescent protein (GFP). The amount of the molecule of interest that a cell receives depends on its position relative to the cell diffusing this molecule, under the control of an external event: the farther the cell is from the source, the fewer molecules it receives. The activation or inhibition of the fluorescent protein according to the concentration distinguishes the bands surrounding the source. In the original design, the protein does not fluoresce in an intermediary band.

From a computing standpoint, the design can be viewed as a message transmission coupled to a sensor/actuator responsible for fluorescence, leading to the concise GUBS program presented below: the diffusive molecule is AHL, whose production is controlled by a context, and the observation is applied on GFP. Two categories of cells are defined, the Sender and the Receiver. Therefore, two GUBS programs identify the two cell types.

Sender = {AHL:{low ≠ mid ≠ high}, [Light]{detect ○→ AHL(low), detect ○→ AHL(mid), detect ○→ AHL(high)}}
Receiver={ AHL(Zow) 0-» GFP, AHL(mM) GFP,AHL(/«gfc) <y* GFP,ofoi::GFP,oi>s 2 »GFP} 
Figure 3 describes the original genetic circuit used in the article. The diffusible molecule is the constant AHL. The gene LuxR has three activation thresholds: at level 2, it activates both LacIM1 and CI; at level 1, the amount of AHL only allows the activation of CI; and finally, at level 0, none are activated.
We show that from the sender-receiver program, we obtain the original design by applying the afore 



Q1 = {[Light]{detect ○→ Tetr}}
g 2 ={Tetr^ Luxl} 

2 3 ={AHL:{/ow # mid * high}, Luxl AHL(/ow), Luxl AHL(mirf), Luxl -^U AHL(fo'gft)} 

Q4 = {AHL:{low ≠ mid ≠ high}, LuxR:{low ≠ {mid < high}}, AHL(mid) ○→ LuxR(mid), AHL(high) ○→ LuxR(high)}

g 5 ={LuxR:{/ow * {mid < high}}, LuxR(mi'rf) CI, LuxR(WgA) CI + LaclMl} 

g 6 ={CI^Lacl} 

g 7 ={LaclMl^GFP} 

g s ={Lacl^GFP} 



Table 5: Part of the database dedicated to the Band Detector. 

mentioned rules with an appropriate selection of components. The regulations of Figure 3 are described as GUBS programs (Table 5), translating their regulatory action in terms of dependences and relations on their attributes. We focus here on some illustrative steps of the compilation of the sender program. The complete functional synthesis is given in the Appendix. The compilation consists in finding the appropriate components whose assembly behaviourally includes the sender-receiver program, with the particularity that the diffusive molecule must be the same in both programs. To ease the follow-up of the compilation, we label each dependency of the sender-receiver program (Table 6). Let us consider $P_{11}$, whose compilation is close to that of $P_{12}$ and $P_{13}$. Notice that $P_{11}$ cannot be directly instantiated with any component because, on the one hand, the component $Q_1$ contains a context like $P_{11}$ but applied on gene Tetr instead of AHL, and on the other hand, $Q_3$ has the AHL molecule but no context is defined. So, to fit $P_{11}$ with the components $Q_1$, $Q_2$ and $Q_3$, the normal dependence is first converted to a persistent one (Rule (N2R)).

$$\frac{Q_1, Q_2, Q_3 \leftarrow_\sigma \{[Light]\{detect \circledast\!\!\rightarrow AHL(low)\}\}}{Q_1, Q_2, Q_3 \leftarrow_\sigma P_{11}}\ \text{(N2R)}$$

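Thereby, the resulting dependence can be separated to match the assembly $Q_1, Q_2, Q_3$ by applying the (Trans.) rule twice, where $v_1$ and $v_2$ are fresh variables.

$$\dfrac{\dfrac{Q_1, Q_2, Q_3 \leftarrow_\sigma P'_{11} = \{[light]\{detect \circledast\!\!\rightarrow v_1,\ v_1 \circledast\!\!\rightarrow v_2,\ v_2 \circledast\!\!\rightarrow AHL(low)\}\}}{Q_1, Q_2, Q_3 \leftarrow_\sigma [light]\{detect \circledast\!\!\rightarrow v_1,\ v_1 \circledast\!\!\rightarrow AHL(low)\}}\ \text{(Trans.)}}{Q_1, Q_2, Q_3 \leftarrow_\sigma [light]\{detect \circledast\!\!\rightarrow AHL(low)\}}\ \text{(Trans.)}$$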



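Finally, we obtain a new program $P'_{11}$ compatible with $Q_1, Q_2, Q_3$, and each variable is substituted by a constant (biological element) with the application of Rule (Inst.). For $P'_{11}$, with $\sigma = \{light/Light, v_1/Tetr, v_2/LuxI\}$, we have:

$$\frac{(Q_1, Q_2, Q_3)[\sigma] \supseteq_{Asm} P'_{11}[\sigma] \qquad obs((Q_1, Q_2, Q_3)[\sigma])}{Q_1, Q_2, Q_3 \leftarrow_\sigma [light]\{detect \circledast\!\!\rightarrow v_1,\ v_1 \circledast\!\!\rightarrow v_2,\ v_2 \circledast\!\!\rightarrow AHL(low)\}}\ \text{(Inst.)}$$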

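By following this scheme for $P_{12}$ and $P_{13}$, we respectively obtain $P'_{12}$ and $P'_{13}$. The final assembly corresponds to the functional synthesis of the Sender program.

$$\dfrac{\dfrac{Q_1, Q_2, Q_3 \leftarrow_{\sigma} P'_{11}}{Q_1, Q_2, Q_3 \leftarrow_{\sigma} P_{11}} \qquad \dfrac{Q_1, Q_2, Q_3 \leftarrow_{\sigma'} P'_{12}}{Q_1, Q_2, Q_3 \leftarrow_{\sigma'} P_{12}} \qquad \dfrac{Q_1, Q_2, Q_3 \leftarrow_{\sigma''} P'_{13}}{Q_1, Q_2, Q_3 \leftarrow_{\sigma''} P_{13}}}{Q_1, Q_2, Q_3 \leftarrow_{\sigma \cup \sigma' \cup \sigma''} P_{11}, P_{12}, P_{13}}\ \text{(Asm.)}$$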
Sender


Receiver 


P11 = {[Light]{detect ○→ AHL(low)}}
P12 = {[Light]{detect ○→ AHL(mid)}}
P13 = {[Light]{detect ○→ AHL(high)}}


P21 = {AHL(/ow) O* GFP} 
P22 = {AHL(mid) &->• GFP} 
P23 = {AHL(«g/!) GFP} 



with {AHL:{low ≠ mid ≠ high}} as attributes of AHL.



Table 6: Separation of the dependences. 

In conclusion, the functional synthesis generates the original genetic circuit (Figure 3) from the sender program. A similar approach can also be applied to obtain the receiver program (see the complete proof in the Appendix).

Sender = {AHL:{/ow * mid * high}, [Light] {detect O-* Tetr}, 

Tetr^ Luxl.Luxl -U AHL(W),Luxl -U AHL(mW),Luxl AHL(«g/i)} 

5 Related works 

Several domain-specific languages have been developed to model and simulate biological systems.
Based on process-calculus, seminally used to model process concurrency, several rule-based languages 
model protein interactions fi 2Tl [TBI [Toll . Another approach is based on logic, such as BIOCHAM [8] that 
formalizes the temporal properties of a biological system. As these languages are dedicated to simulation, the objective is to close the systems, because simulations need to integrate all the characteristics of the analysed systems. By comparison, the purpose of GUBS is different: the issue is to represent the behaviour of a synthetic device in an organism, which leads to translating the openness of biological systems into the semantics of the language.

In synthetic biology, the structural description languages lfl2l l20l |H allow to specify well-formed 
genome sequences by grammars modularly and hierarchically. Although the sequence description is necessary, the programmer must anticipate beforehand the behaviour of the device to be conceived. Besides, the behavioural design is not included in the program although it initially motivates it. In GUBS, the design is driven by a behaviour description, and the sequence selection is postponed to the compile phase. Moreover, the size of a structural description is subject to a combinatorial explosion when the complexity of the programmed systems increases.

Amorphous programming languages have also been investigated to specify biological devices at the scale of a cell colony, considered here as a possible computing medium for amorphous programs. J. Beal [3] demonstrates the proof of concept of this approach in PROTO, showing the feasibility of an automatic compile chain. In GUBS, the compile chain is based on rewriting rules whose correctness has been formally proved with regard to a semantics describing the constraints of an open system.

Developing a language for biological systems actually involves considering several unknowns due to their openness: the lack of knowledge of all the interactions in biological circuits and the imprecise definition of initial conditions. We only know the result of a chain of effects. The major constraint for programming open systems thus seems to be: how to provide a language expressive enough to describe the dynamics of such systems, yet simple enough to capture the essence of the biological questions in a small program, so that large biological systems can be programmed with humanly achievable programs.

In the future, design in synthetic biology will certainly require different programming layers based on different paradigms addressing the integration levels of biological systems. In a tower of languages, starting from a language with collective operations on a cell colony, such as an amorphous programming language like Proto [3] or a language for dynamical systems with dynamical structures like MGS [15], and ending with a structural description programmed in a grammar-based language, the GUBS language occupies the intermediary level dedicated to the behavioural programming of the cell entity.

6 Conclusion 

In the GUBS language, we propose a programming paradigm abstracting the molecular interactions in the context of open systems, which differs from approaches dedicated to biological system modelling. Accordingly, the interactions are symbolized by causal dependences whose interpretation is driven by effect. We have demonstrated the proof of concept of a compilation based on rewriting rules and illustrated it on a realistic example. The perspective of this work is to find an efficient compilation algorithm. Identifying the biological parameters guiding the component selection should be a key issue in this undertaking.
program := {behaviour}
behaviour := behaviour, behaviour | behaviour
behaviour := compartment | dependence | context | observation | defattributes
compartment := varconstant {behaviour}
observation := varconstant::worlds
context := [varconstants] {behaviour}
dependence := worlds ○→ worlds | worlds ⊛→ worlds | worlds ⊕→ worlds
world := attribute | varconstant(attribute) | varconstant.world
worlds := worlds + world | world
attribute := varconstant | ¬varconstant
defattribute := varconstants : attspec
attspec := defspec {varconstants} | {attrels}
defspec := exclusion | inclusion
attrels := attrels, attrel | attrel
attrel := varconstant < varconstant | varconstant ≠ varconstant | varconstant
varconstant := word | Word
varconstants := varconstants, varconstant | varconstant

Table 7: Syntax of GUBS program



Appendix 

Proofs 

Proposition 1. By contradiction, assume that P is unobservable; then there does not exist a model satisfying the formula. As Q is observable, we deduce that there exist models satisfying Q, but no restricted model can satisfy P, which contradicts the definition of the behavioural consequence. □

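Proposition 3. Let $\psi$ be a formula, let $\sigma : (NOM \cup PROP \cup REL) \to (NOM \cup PROP \cup REL)$ be a substitution on nominals, variables and relational symbols, and let $\mathcal{M} = (W, (R_k)_{k \in \tau}, V)$ be a model. We define the model $\mathcal{M}^\sigma = (W, (R^\sigma_k)_{k \in \tau}, V^\sigma)$ from $\mathcal{M}$ as follows:

1. $\forall a \in NOM \cup PROP, \forall w \in W : w \in V(a\sigma) \iff w \in V^\sigma(a)$;

2. $\forall k \in \tau : w R_{k\sigma} w' \iff w R^\sigma_k w'$;

we have: $\mathcal{M}, w \Vdash \psi\sigma \iff \mathcal{M}^\sigma, w \Vdash \psi$.

Proof. The proof is by induction on the formula. Without loss of generality, we assume that $\psi$ is in Negation Normal Form, where negation occurs only immediately before variables. Recall that every formula can be put in Negation Normal Form.

• $\mathcal{M}, w \Vdash a\sigma \iff \mathcal{M}^\sigma, w \Vdash a$, $a \in PROP \cup NOM$. By (1), we have $w \in V(a\sigma) \iff w \in V^\sigma(a)$, leading to the equivalence.

• $\mathcal{M}, w \Vdash \neg a\sigma \iff \mathcal{M}^\sigma, w \Vdash \neg a$. By definition of the realizability relation, this is equivalent to: $\mathcal{M}, w \nVdash a\sigma \iff \mathcal{M}^\sigma, w \nVdash a$. By (1), this equivalence holds.

• $\mathcal{M}, w \Vdash (\psi_1 \wedge \psi_2)\sigma \iff \mathcal{M}^\sigma, w \Vdash (\psi_1 \wedge \psi_2)$. By definition of the substitution, we have to prove: $\mathcal{M}, w \Vdash (\psi_1\sigma) \wedge (\psi_2\sigma) \iff \mathcal{M}^\sigma, w \Vdash (\psi_1 \wedge \psi_2)$. By definition of the realizability relation, we can formulate the property equivalently as follows:

$$\mathcal{M}, w \Vdash (\psi_1\sigma) \wedge \mathcal{M}, w \Vdash (\psi_2\sigma) \iff \mathcal{M}^\sigma, w \Vdash \psi_1 \wedge \mathcal{M}^\sigma, w \Vdash \psi_2.$$

By induction hypothesis, we have: $\mathcal{M}, w \Vdash (\psi_1\sigma) \iff \mathcal{M}^\sigma, w \Vdash \psi_1$ and $\mathcal{M}, w \Vdash (\psi_2\sigma) \iff \mathcal{M}^\sigma, w \Vdash \psi_2$, implying the previous condition.

• $\mathcal{M}, w \Vdash (\psi_1 \vee \psi_2)\sigma \iff \mathcal{M}^\sigma, w \Vdash (\psi_1 \vee \psi_2)$. The proof is similar to the proof of the previous item ($\wedge$).

• $\mathcal{M}, w \Vdash (@_a \psi)\sigma \iff \mathcal{M}^\sigma, w \Vdash @_a \psi$. By definition of the substitution, we have to prove that: $\mathcal{M}, w \Vdash @_{a\sigma}\, \psi\sigma \iff \mathcal{M}^\sigma, w \Vdash @_a \psi$. By definition of the realizability relation, this is equivalent to:

$$\exists w' \in W : w' \in V(a\sigma) \wedge \mathcal{M}, w' \Vdash \psi\sigma \iff \exists w'' \in W : w'' \in V^\sigma(a) \wedge \mathcal{M}^\sigma, w'' \Vdash \psi.$$

By setting $w' = w''$, from (1) we have: $w' \in V(a\sigma) \iff w' \in V^\sigma(a)$. By induction hypothesis, we have: $\mathcal{M}, w' \Vdash \psi\sigma \iff \mathcal{M}^\sigma, w' \Vdash \psi$. Both properties imply that:

$$\exists w' \in W : w' \in V(a\sigma) \wedge \mathcal{M}, w' \Vdash \psi\sigma \iff \exists w' \in W : w' \in V^\sigma(a) \wedge \mathcal{M}^\sigma, w' \Vdash \psi,$$

which implies the initial property.

• $\mathcal{M}, w \Vdash (\langle k \rangle \psi)\sigma \iff \mathcal{M}^\sigma, w \Vdash \langle k \rangle \psi$. By definition of the substitution, we prove that: $\mathcal{M}, w \Vdash \langle k\sigma \rangle \psi\sigma \iff \mathcal{M}^\sigma, w \Vdash \langle k \rangle \psi$. By definition of the realizability relation, the condition is equivalent to:

$$\exists w' \in W : \mathcal{M}, w' \Vdash \psi\sigma \wedge w R_{k\sigma} w' \iff \exists w'' \in W : \mathcal{M}^\sigma, w'' \Vdash \psi \wedge w R^\sigma_k w''.$$

By setting $w' = w''$, the following equivalence holds from (2): $w R_{k\sigma} w' \iff w R^\sigma_k w'$. By induction hypothesis, we have: $\mathcal{M}, w' \Vdash \psi\sigma \iff \mathcal{M}^\sigma, w' \Vdash \psi$. Both properties imply that:

$$\exists w' \in W : \mathcal{M}, w' \Vdash \psi\sigma \wedge w R_{k\sigma} w' \iff \exists w' \in W : \mathcal{M}^\sigma, w' \Vdash \psi \wedge w R^\sigma_k w',$$

which implies the initial property.

• $\mathcal{M}, w \Vdash ([k]\psi)\sigma \iff \mathcal{M}^\sigma, w \Vdash [k]\psi$. The proof is similar to the previous item.

• $\mathcal{M} \Vdash (E\psi)\sigma \iff \mathcal{M}^\sigma \Vdash E\psi$. By definition of the substitution, we prove that: $\mathcal{M}, w \Vdash E(\psi\sigma) \iff \mathcal{M}^\sigma, w \Vdash E\psi$. By definition of the realizability relation, we have:

$$\exists w \in W : \mathcal{M}, w \Vdash (\psi\sigma) \iff \mathcal{M}^\sigma, w \Vdash \psi,$$

which is directly verified by induction hypothesis.

• $\mathcal{M} \Vdash (A\psi)\sigma \iff \mathcal{M}^\sigma \Vdash A\psi$. The proof is similar to the previous item.

□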

Proposition 2. First, let us remark that when $P \not\sqsubseteq Q$, the property is trivially verified. Besides, under the assumption $P \sqsubseteq Q$, if $Q[\sigma]$ is not observable the property is also verified, because an unobservable program includes all programs behaviourally (Definition 2).

In the rest of the proof, we assume that $P$ is behaviourally included in $Q$ and $Q[\sigma]$ is observable (i.e., $P \sqsubseteq Q$ and $obs\ Q[\sigma]$). Hence, by definition of the observability, there exists a model $\mathcal{M}$ such that $\mathcal{M} \Vdash [\![Q[\sigma]]\!]$. By Proposition 3, we deduce that there exists a model $\mathcal{M}^\sigma$ such that: $\mathcal{M}^\sigma \Vdash [\![Q]\!]$. Moreover, as $P \sqsubseteq Q$ by hypothesis, there exists $S \subseteq Dom\ \mathcal{M}^\sigma$ such that: $\mathcal{M}^\sigma_{|S} \Vdash [\![P]\!]$. By construction of $\mathcal{M}^\sigma$, we deduce that there exists a sub-model of $\mathcal{M}$, denoted by $\mathcal{M}'$, complying with the properties (1) and (2) of Proposition 3, which corresponds to $\mathcal{M}^\sigma_{|S}$. Moreover, we have $\mathcal{M}' \Vdash P[\sigma]$ by Proposition 3. Then we conclude that: $P[\sigma] \sqsubseteq Q[\sigma]$. □
Theorem 1. First, let us remark that $P \sqsubseteq Q$ is true whenever $\mathcal{M} \nVdash Q$, by definition of the behavioural inclusion (Definition 2). Hence, the proof does not consider this trivially verified case but rather the case where $\mathcal{M} \Vdash Q$.

Inst. Directly from the definition of the behavioural inclusion (Definition 2).

Com. By definition of the semantics, $[\![P, P']\!] = A(\varphi \wedge \varphi') = A(\varphi' \wedge \varphi) = [\![P', P]\!]$, with $\varphi$ and $\varphi'$ the formulas respectively associated with $P$ and $P'$. Thus, for all $\mathcal{M}$ we have: $\mathcal{M} \Vdash [\![P, P']\!] \iff \mathcal{M} \Vdash [\![P', P]\!]$. Hence, if $Q \leftarrow_\sigma P, P'$, we conclude that: $Q \leftarrow_\sigma P', P$.

Cont. Similar to the proof of (Com.).

Asm. First, let us remark that $\sigma|_{VA(P) \cap VA(P')} = \sigma'|_{VA(P) \cap VA(P')}$ means that the substitutions of the common variables are the same for $\sigma$ and $\sigma'$, leading to $Q[\sigma \cup \sigma'] = Q[\sigma]$ and $Q'[\sigma \cup \sigma'] = Q'[\sigma']$. Let $\sigma'' = \sigma \cup \sigma'$. Then, we have the following property by definition of the semantics (Table 2) and $\sigma''$:

$$\forall \mathcal{M} \in KS([\![(Q, Q')[\sigma'']]\!]) : \mathcal{M} \Vdash [\![Q[\sigma]]\!] \wedge \mathcal{M} \Vdash [\![Q'[\sigma']]\!].$$

Notice that the set of models $KS([\![(Q, Q')[\sigma'']]\!])$ is not empty since, by hypothesis, $obs(Q[\sigma], Q'[\sigma'])$ holds. As $Q \leftarrow_\sigma P$ and $Q' \leftarrow_{\sigma'} P'$, any model validating $Q$ (resp. $Q'$) also validates $P$ (resp. $P'$) by definition of the functional synthesis. Then, we deduce that:

$$\forall \mathcal{M} \in KS([\![(Q, Q')[\sigma'']]\!]) : \mathcal{M} \Vdash [\![P[\sigma]]\!] \wedge \mathcal{M} \Vdash [\![P'[\sigma']]\!].$$

Then, we conclude that:

$$\forall \mathcal{M} \in KS([\![(Q, Q')[\sigma'']]\!]) : \mathcal{M} \Vdash [\![(P, P')[\sigma'']]\!].$$

□
Table 8: Complete band detector compilation.



